DETAILED ACTION

Response to Amendment 

1. Applicant's arguments with respect to claims 1-18 and 23-33 have been considered but 
are moot in view of the new ground(s) of rejection. 

2. Applicant's arguments, see pages 8-10, filed August 24, 2005, with respect to the 
rejection(s) of claim(s) 1-18 and 23-33 under 35 U.S.C. 103(a) have been fully considered and 
are persuasive. Therefore, the rejection has been withdrawn. However, upon further 
consideration, a new ground(s) of rejection is made in view of Joffe (US006330584B1). 

3. Applicant argues that Blelloch (US005768594A) does not teach an interleaver for 
interleaving instructions from the plurality of programs (pages 8-9). 

In reply, the Examiner agrees. However, new grounds of rejection are made in view of
Joffe.

Claim Rejections - 35 USC § 102 

4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section
122(b), by another filed in the United States before the invention by the applicant for
patent or (2) a patent granted on an application for patent by another filed in the United
States before the invention by the applicant for patent, except that an international
application filed under the treaty defined in section 351(a) shall have the effects for
purposes of this subsection of an application filed in the United States only if the
international application designated the United States and was published under Article
21(2) of such treaty in the English language.

5. Claims 25-27 are rejected under 35 U.S.C. 102(e) as being anticipated by Joffe.

6. With regard to Claim 25, Joffe discloses a programmable processor (160, Figure 1) for
executing a plurality of programs (pipelined multi-tasking processor (microcontroller) 160, Col.
3, lines 40-42; processor executes several tasks, Col. 2, lines 66-67; load programs into the
microcontroller for execution, Col. 5, lines 48-51). According to the disclosure of this
application, the definition of "program" is an operation performed on data. For example,
operations on different data represent different programs (page 8, line 5). Joffe describes that
each task is an operation performed on data (each task processes a separate frame of data, Col.
2, lines 41-42), and therefore the tasks are programs. The programmable processor comprises a
target program counter (320, Figure 5) coupled to a plurality of program counters (315) (Col. 9,
lines 56-61; Col. 2, lines 11-13); each of the plurality of program counters coupled to an
instruction memory (314; Col. 10, lines 1-5); instructions from the instruction memory coupled
to an instruction decode (Col. 10, lines 1-7); the decode coupled to a plurality of registers (312,
Col. 10, lines 6-12); each of the plurality of registers coupled to an operand route; the operand
route coupled to an arithmetic datapath (318; Col. 10, lines 8-12); the datapath and an output of a
data memory (312) coupled to a result route; and an output of the result route fed back to each of
the plurality of registers (Col. 9, lines 26-28), as shown in Figure 5.

7. With regard to Claim 26, Joffe discloses that the plurality of program counters is equal to
the plurality of programs to be interleaved (each task has a separate program counter register,
Col. 2, lines 11-13; Col. 2, lines 29-34).

8. With regard to Claim 27, Joffe discloses that the plurality of registers is equal to the
plurality of programs to be interleaved (separate registers for each task, Col. 2, lines 8-11).

9. Thus, it reasonably appears that Joffe describes or discloses every element of Claims 25-
27 and therefore anticipates the claimed subject matter.

Claim Rejections - 35 USC § 103

10. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or 
described as set forth in section 102 of this title, if the differences between the subject 
matter sought to be patented and the prior art are such that the subject matter as a whole 
would have been obvious at the time the invention was made to a person having ordinary 
skill in the art to which said subject matter pertains. Patentability shall not be negatived 
by the manner in which the invention was made. 

11. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459
(1966), that are applied for establishing a background for determining obviousness under 35
U.S.C. 103(a) are summarized as follows:

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue.

3. Resolving the level of ordinary skill in the pertinent art.

4. Considering objective evidence present in the application indicating obviousness
or nonobviousness.

12. Claims 1-7, 9-11, 14-17, 23, 24, and 33 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Joffe (US006330584B1) in view of Krishna (US006161173A).

13. With regard to Claim 1, Joffe describes a programmable processor (160, Figure 1) for
executing a plurality of tasks (pipelined multi-tasking processor (microcontroller) 160, Col. 3,
lines 40-42; processor executes several tasks, Col. 2, lines 66-67; load programs into the
microcontroller for execution, Col. 5, lines 48-51). According to the disclosure of this
application, the definition of "program" is an operation performed on data. For example,
operations on different data represent different programs (page 8, line 5). Joffe describes that
each task is an operation performed on data (each task processes a separate frame of data, Col.
2, lines 41-42), and therefore the tasks are programs. The programmable processor comprises an
execution pipeline (Col. 3, lines 40-42); and an interleaver for interleaving instructions from the
plurality of programs and providing the instructions to the pipeline for execution (Col. 2, lines
29-34). Joffe describes that more than one task for each data flow are provided to the pipeline,
which allows the pipeline to start processing the next frame before the processing of the previous
frame in the same data flow is completed (Col. 7, lines 44-49; Col. 8, lines 8-15) and therefore
the number of the plurality of programs that are interleaved is greater than or equal to the depth
of the pipeline.

However, Joffe does not teach that the execution pipeline has an average pipeline latency 
of one instruction per cycle. However, Krishna discloses the goal of achieving an average 
pipeline latency of one clock cycle (a main scheduler schedules execution of operations and
allots a single clock cycle... even though the execution unit is unable to execute some instructions
in a single clock cycle... local... circuitry controls execution pipelines having latency of two or
more clock cycles, Col. 2, lines 35-67) although it is not always possible to do so.

It would have been obvious to one of ordinary skill in the art at the time of invention by 
applicant to modify the device of Joffe so that the execution pipeline has an average pipeline 
latency of one instruction per cycle as suggested by Krishna because it results in a more 
streamlined pipeline operation and simplified design (Krishna, Col. 2, lines 60-67). 

14. With regard to Claim 2, Joffe describes that the pipeline has a datapath with a depth equal 
to the number of programs (Col. 1, lines 62-65). 

15. With regard to Claim 3, Joffe describes that a next instruction from one of the plurality of
programs is not provided to the pipeline until a previous instruction of the one of the plurality of
programs has completed (the task does not get access to the same resource until after every other
task sharing the resource has finished accessing the resource, Col. 2, lines 35-39).
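
The interleaving discussed in paragraphs 13-15 can be illustrated with a minimal sketch. It is not taken from Joffe or from the application: the names `interleave` and `has_overlap` and the instruction labels are hypothetical, and strict round-robin issue with a fixed pipeline depth is assumed. Under those assumptions, when the number of programs is at least the pipeline depth, each instruction has left the pipeline before the same program issues its next one.

```python
from typing import Dict, List, Tuple

Issue = Tuple[int, int, str]  # (issue_cycle, program_index, instruction)

def interleave(programs: List[List[str]]) -> List[Issue]:
    """Issue at most one instruction per cycle, round-robin across programs."""
    issued: List[Issue] = []
    pcs = [0] * len(programs)            # one program counter per program
    cycle = 0
    while any(pc < len(p) for pc, p in zip(pcs, programs)):
        idx = cycle % len(programs)      # strict round-robin slot (assumption)
        if pcs[idx] < len(programs[idx]):
            issued.append((cycle, idx, programs[idx][pcs[idx]]))
            pcs[idx] += 1
        cycle += 1                       # an exhausted program leaves its slot idle
    return issued

def has_overlap(issued: List[Issue], depth: int) -> bool:
    """True if a program issues before its previous instruction has completed.

    An instruction issued at cycle c is assumed to occupy the pipeline
    through cycle c + depth - 1.
    """
    last_issue: Dict[int, int] = {}
    for cycle, idx, _ in issued:
        if idx in last_issue and cycle - last_issue[idx] < depth:
            return True
        last_issue[idx] = cycle
    return False
```

With four programs and a depth of four, `has_overlap` reports no conflict; with only two programs at the same depth it does, which is the situation the "greater than or equal to the depth" condition rules out.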

16. With regard to Claim 4, Joffe describes that each program of the plurality of programs is 
independent of the other of the plurality of programs (Col. 1, line 62-Col. 2, line 7). 

17. With regard to Claim 5, Joffe describes interleaving instructions (Col. 2, lines 29-34), and
therefore the instructions are executed out of order.

However, Joffe does not teach an output buffer for storing out of order data output.
However, Krishna discloses that the execution engine (140, Figure 1) has an out-of-order
architecture (Col. 5, lines 11-12), and the scheduler (150) receives results from the execution
units (170, 175, 180) and stores the results (Col. 5, lines 28-35). Therefore, Krishna inherently
discloses an output buffer for storing out of order data output.

It would have been obvious to one of ordinary skill in the art at the time of invention by 
applicant to modify the device of Joffe to include an output buffer for storing out of order data 
output as suggested by Krishna because Krishna suggests that since the instructions are executed 
out of order (Col. 5, lines 11-12), the output buffer is needed to store the out of order data output 
so that the data can be put in the correct order (Col. 5, lines 28-35).

18. With regard to Claim 6, Joffe describes one or more of a register copy, program counter,
and program counter stack provided for each of the plurality of programs (separate registers for
each task, each task has a separate program counter (PC), Col. 2, lines 8-13).

19. With regard to Claim 7, Joffe describes that one or more of control and computing 
resources, instructions, instruction memory, data paths, data memory, and caches are shared by 
the plurality of programs (multiple tasks share one or more resources, Col. 2, lines 35-39). 

20. With regard to Claim 9, Joffe describes that the instructions comprise load instructions
for loading data from a data memory (load/store unit 330 queues load and store requests to load
a register from memory or to store register contents to memory, Col. 9, lines 35-41).

21. With regard to Claim 10, Joffe describes that the instructions comprise store instructions
for storing data in a memory (Col. 9, lines 35-41).

22. With regard to Claim 11, it would have been obvious to one of ordinary skill in the art at
the time of invention by applicant to have data memory comprising a cache because it provides 
for faster execution of programs in a processor system. 

23. With regard to Claim 14, Joffe describes a method of executing instructions from a 
plurality of programs comprising identifying N programs of the plurality of programs (Col. 2, 
lines 11-14, 66-67); interleaving instructions from the N programs in a processor pipeline (160,
Figure 1; Col. 2, lines 29-34; Col. 3, lines 40-42); and executing the instructions such that a first
instruction from one of the N programs is completed before beginning execution of a second
instruction of the one of the N programs (Col. 2, lines 35-39).

However, Joffe does not teach that the pipeline has an average latency of one instruction 
per cycle and checking that no no-op is inserted into the pipeline for the purpose of ensuring that 
the first instruction is completed before beginning execution of the second instruction. However, 
Krishna discloses the goal of achieving an average pipeline latency of one clock cycle (Col. 2, 
lines 35-67) although it is not always possible to do so. This would be obvious for the same 
reasons given in the rejection for Claim 1. Krishna also describes that the local scheduling 
circuitry stops the main scheduler from issuing a selected operation if the latency of another 
operation would create a conflict with the main scheduler issuing the selected operation (Col. 2, 
lines 56-60). Therefore, it is ensured that the first instruction is completed before beginning
execution of the second instruction. The operation is executed if no no-op is inserted into the
pipeline (information in each entry describes either no-op or an associated operation which is to
be executed, Col. 5, lines 36-38; operation pipelines, Col. 2, lines 41-45). Therefore, when no
no-op is inserted into the pipeline, this ensures that the first instruction is completed before 
beginning execution of the second instruction. 

It would have been obvious to one of ordinary skill in the art at the time of invention by 
applicant to modify the device of Joffe to include checking that no no-op is inserted into the
pipeline for the purpose of ensuring that the first instruction is completed before beginning
execution of the second instruction as suggested by Krishna because Krishna suggests that a
no-op is needed in order to indicate that the first instruction has not yet completed (Col. 5,
lines 26-38).

24. With regard to Claims 15 and 16, they are similar in scope to Claim 6, and therefore are 
rejected under the same rationale. 

25. With regard to Claim 17, Claim 17 is similar in scope to Claim 2, and therefore is
rejected under the same rationale.

26. With regard to Claim 23, Claim 23 is similar in scope to Claim 14, and therefore is
rejected under the same rationale.

27. With regard to Claim 24, Claim 24 is similar in scope to Claim 1, and therefore is
rejected under the same rationale.

28. With regard to Claim 33, Joffe describes a method of executing one or more instructions 
from a plurality of programs (Col. 2, lines 66-67), comprising assigning a first output register 
slot to a first of the plurality of programs (Col. 2, lines 8-11); executing the one or more 
instructions of the first program until the first program is completed; loading output of the first 
program into its reserved space when the first program is completed (Col. 9, lines 26-41);
checking to see if all of the plurality of programs are completed; checking to see if a second
output register slot is available to assign to a second program from the plurality of programs 
when the first program is completed; checking to see if one or more instructions are available 
when at least one of the plurality of programs is not completed (Col. 2, lines 35-39). 

However, Joffe does not teach placing a no-op when no more instructions are available or 
the second output register slot is not available. However, Krishna discloses information in each 
entry describes either no-op or an associated operation which is to be executed (Col. 5, lines 36- 
38). Therefore, when there is a no-op, that means that no more instructions are available. This 
would be obvious for the same reasons given in the rejection for Claim 14. 
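
The no-op placement discussed for Claim 33 can be pictured with a short sketch. It is an illustration only, not Krishna's scheduler or the applicant's method; the function `issue`, the `NOP` marker, and the queue of pending instructions are hypothetical names, and a single output register slot flag is assumed.

```python
from collections import deque
from typing import Deque

NOP = "nop"  # hypothetical marker for an empty pipeline slot

def issue(pending: Deque[str], output_slot_free: bool) -> str:
    """Return the operation to place in the pipeline this cycle.

    A no-op is placed when no instruction is available or when the
    output register slot cannot be assigned (both conditions assumed).
    """
    if not pending or not output_slot_free:
        return NOP
    return pending.popleft()
```

For example, with two pending instructions the call sequence `issue(q, True)`, `issue(q, False)`, `issue(q, True)`, `issue(q, True)` places the first instruction, a no-op, the second instruction, and then a no-op once the queue is empty.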

29. Claims 8 and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable over Joffe
(US006330584B1) and Krishna (US006161173A) in view of Nguyen (US005961628A).

30. With regard to Claim 8, Joffe and Krishna are relied upon for the teachings as discussed
above relative to Claim 1. The Joffe-Krishna combination implicitly discloses SIMD execution
of vector instructions without addressing vector lengths.

However, Joffe and Krishna do not explicitly disclose that the processor executes SIMD 
vector instructions of vector length N and executes in parallel a plurality of instructions having
SIMD vector lengths that sum up to N. However, Nguyen explicitly discloses that the processor
executes SIMD vector instructions of vector length N and executes in parallel a plurality of
instructions having SIMD vector lengths that sum up to N (Col. 1, lines 11-24; Col. 53-60).

It would have been obvious to one of ordinary skill in the art at the time of invention by 
applicant to modify the devices of Joffe and Krishna so that the processor executes SIMD vector
instructions of vector length N and executes in parallel a plurality of instructions having SIMD 
vector lengths that sum up to N as suggested by Nguyen because it provides a way to reduce 
processing time for repetitive tasks (Col. 1, lines 10-25). 

31. With regard to Claim 18, Claim 18 is similar in scope to Claim 8, and therefore is
rejected under the same rationale. 

32. Claims 12 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over Joffe
(US006330584B1) and Krishna (US006161173A) in view of Narayanaswami (US005973705A).

Joffe and Krishna are relied upon for the teachings as discussed above relative to Claim 9.

However, Joffe and Krishna do not disclose that address space of the data memory
comprises a frame buffer unit and a texture memory unit. However, Narayanaswami
explicitly discloses a SIMD graphics processing system comprising a frame buffer unit (frame
buffer 110f, Figure 2A) while implicitly suggesting a texture memory unit.

It would have been obvious to one of ordinary skill in the art at the time of invention by 
applicant to modify the devices of Joffe and Krishna so that address space of the data memory
comprises a frame buffer unit and a texture memory unit as suggested by Narayanaswami 
because it provides a way to reduce processing time (Col. 2, lines 20-22). 

33. Claims 28-31 are rejected under 35 U.S.C. 103(a) as being unpatentable over Joffe
(US006330584B1) in view of Akkary (US006493820B2).

34. With regard to Claim 28, Joffe is relied upon for the teachings as discussed above relative
to Claim 25.

However, Joffe does not explicitly teach that the plurality of registers is more than the
plurality of programs to be interleaved. However, Akkary discloses that trace buffers may be the 
same as or different than the number of program counters and that these buffers may be single 
memory divided into individual trace buffers or physically separate trace buffers or some 
combination of the two; and each program counter is associated with a particular thread ID and 
trace buffer and also that there is not such a restricted relationship (Col. 8, lines 5-15); and
dependency generation and decoding circuitry 218A could include multiple dependency fields
and registers (Col. 15, lines 44-58).

It would have been obvious to one of ordinary skill in the art at the time of invention by
applicant to modify the device of Joffe so that the plurality of registers is more than the plurality
of programs to be interleaved as suggested by Akkary because it helps increase throughput.

35. With regard to Claims 29, 30, and 31, the concept of more resources available than 
required would result in increased throughput and double-buffering is a well-known scheme to 
avoid waiting for resources to carry data processing in an efficient manner. This is similar in 
scope to Claim 28 above and Claims 29 and 30 are rejected under the same rationale.

36. Claim 32 is rejected under 35 U.S.C. 103(a) as being unpatentable over Narayanaswami 
(US005973705A) in view of Krishna (US006161173A). 

According to the disclosure of this application, a complex instruction includes operations 
such as matrix multiply, vector normalization, trigonometric functions, and exponentiation (page 
21, lines 1-4). Narayanaswami discloses a method of executing one or more complex or 
compound instructions from a plurality of programs, comprising implementing the instructions in
one or more pipelined units (Col. 2, lines 35-63). 

However, Narayanaswami does not teach that each of the instructions is issued to the one 
or more units in each cycle. However, Krishna discloses the goal of achieving an average 
pipeline latency of one clock cycle (Col. 2, lines 35-67) although it is not always possible to do 
so. Therefore, each of the instructions is issued to the one or more units in each cycle. This 
would be obvious for the same reasons given in the rejection for Claim 1.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure. The following prior art teach SIMD processing and execution of pipelines in 
superscalar processors. 

U.S. Patent No. 6,470,445 B1 to Arnold et al.
U.S. Patent No. 5,420,990 to McKeen et al.
U.S. Patent No. 6,064,818 to Brown et al.
U.S. Patent No. 5,428,807 to McKeen et al.
U.S. Patent No. 6,282,635 to Sachs
U.S. Patent No. 5,802,386 to Kahle et al.
U.S. Patent No. 5,949,996 to Atsushi
U.S. Patent No. 6,209,078 to Chiang et al.
U.S. Patent No. 5,548,737 to Edrington et al.
U.S. Patent No. 6,412,061 to Dye
U.S. Patent No. 5,809,552 to Kuroiwa et al.
U.S. Patent No. 6,508,862 to Joy et al.

In particular, U.S. Patent No. 6,507,862 to Joy et al. discloses vertical and horizontal
threaded processors. Joy et al. discloses a single pipeline shared among a plurality of machine
states or threads; a thread that is currently active and not stalled is selected and supplied data or
functional blocks connected to the pipeline; when the active thread is stalled, the pipeline
immediately switches to a non-stalled thread and begins executing the non-stalled thread (Col. 6,
lines 10-40; Col. 8, lines 15-60).
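
The switch-on-stall behavior summarized above can be sketched as a simple selection rule. This is an assumption-laden illustration, not the selection logic of Joy et al.; the function `select_thread` and the rotating search order are hypothetical.

```python
from typing import Optional, Sequence

def select_thread(active: int, stalled: Sequence[bool]) -> Optional[int]:
    """Keep the active thread unless it is stalled.

    If the active thread is stalled, return the next non-stalled thread,
    searching in rotating order (an assumed policy). Return None when
    every thread is stalled.
    """
    n = len(stalled)
    for offset in range(n):
        candidate = (active + offset) % n
        if not stalled[candidate]:
            return candidate
    return None
```

A call such as `select_thread(0, [True, False, False])` moves execution from stalled thread 0 to thread 1, while `select_thread(0, [False, False])` leaves thread 0 running.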

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Joni Hsu whose telephone number is 571-272-7785. The
examiner can normally be reached on M-F 8am-5pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Ulka Chauhan, can be reached on 571-272-7782. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished
applications is available through Private PAIR only. For more information about the PAIR
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 



JH 



ULKA CHAUHAN 
SUPERVISORY PATENT EXAMINER 



